Joint interpretation of input speech and pen gestures for multimodal human-computer interaction
Authors
Abstract
This paper describes our initial work in semantic interpretation of multimodal user input that consists of speech and pen gestures. We have designed and collected a multimodal corpus of over a thousand navigational inquiries around the Beijing area. We devised a processing sequence for extracting spoken references from the speech input (perfect transcripts) and interpreting each reference by generating a hypothesis list of possible semantics (i.e. locations). We also devised a processing sequence for interpreting pen gestures (pointing, circling and strokes) and generating a hypothesis list for every gesture. Partial interpretations from individual modalities are combined using Viterbi alignment, which enforces temporal order and semantic compatibility constraints in its cost functions to generate an integrated interpretation across modalities for the overall input. This approach can correctly interpret over 97% of the 322 multimodal inquiries in our test set.
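The cross-modal integration step described above can be sketched as a monotonic dynamic-programming alignment between the two hypothesis lists. The following is a minimal illustration, not the paper's implementation: the function names, the unit incompatibility cost, and the `time_weight` parameter are illustrative assumptions.

```python
def compatibility_cost(ref_hyps, gesture_hyps):
    # Assumed cost: zero when the two hypothesis lists share a
    # candidate location, one otherwise.
    return 0.0 if set(ref_hyps) & set(gesture_hyps) else 1.0

def viterbi_align(references, gestures, time_weight=0.1):
    """Align each spoken reference to one pen gesture.

    references, gestures: lists of (timestamp, [candidate locations]).
    The DP enforces temporal order (gesture indices strictly increase
    with reference indices) and penalises semantic incompatibility
    plus time skew in its cost function.
    Returns (total_cost, [(ref_index, gesture_index), ...]).
    """
    n, m = len(references), len(gestures)
    INF = float("inf")
    dp = [[INF] * m for _ in range(n)]    # dp[i][j]: best cost, ref i -> gesture j
    back = [[-1] * m for _ in range(n)]   # backpointer to the previous gesture

    def pair_cost(i, j):
        (rt, rh), (gt, gh) = references[i], gestures[j]
        return compatibility_cost(rh, gh) + time_weight * abs(rt - gt)

    for j in range(m):
        dp[0][j] = pair_cost(0, j)
    for i in range(1, n):
        for j in range(i, m):
            # Temporal-order constraint: the previous reference must
            # have used an earlier gesture.
            prev = min(range(j), key=lambda k: dp[i - 1][k])
            dp[i][j] = dp[i - 1][prev] + pair_cost(i, j)
            back[i][j] = prev

    # Best final gesture for the last reference, then backtrace.
    j = min(range(n - 1, m), key=lambda jj: dp[n - 1][jj])
    total = dp[n - 1][j]
    path = []
    for i in range(n - 1, -1, -1):
        path.append((i, j))
        j = back[i][j]
    return total, list(reversed(path))
```

With two spoken references whose hypothesis lists overlap with those of two roughly co-timed gestures, the alignment pairs them in order and the total cost reflects only the small time skew.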
Similar resources
Flexible Speech and Pen Interaction with Handheld Devices
An emerging research direction in the field of pervasive computing is to voice-enable applications on handheld computers. Map-based applications can benefit the most from multimodal interfaces based on speech and pen input, and graphics and speech output. However, implementing automatic speech recognition and speech synthesis on handheld computers is constrained by the relatively low computation...
Chapter to appear in Handbook of Human-Computer Interaction, (ed
Multimodal systems process two or more combined user input modes— such as speech, pen, touch, manual gestures, gaze, and head and body movements— in a coordinated manner with multimedia system output. This class of systems represents a new direction for computing, and a paradigm shift away from conventional WIMP interfaces. Since the appearance of Bolt’s (1980) “Put That There” demonstration sy...
Complementarity and redundancy in multimodal user inputs with speech and pen gestures
We present a comparative analysis of multi-modal user inputs with speech and pen gestures, together with their semantically equivalent uni-modal (speech only) counterparts. The multimodal interactions are derived from a corpus collected with a Pocket PC emulator in the context of navigation around Beijing. We devise a cross-modality integration methodology that interprets a multi-modal input an...
On Multimodal Route Navigation in PDAs
One of the biggest obstacles in building versatile natural human-computer interaction systems is that the recognition of natural speech is still not sufficiently robust, especially in mobile situations where it's almost impossible to cancel out all irrelevant auditory information. In multimodal systems the possibility to disambiguate between several input and output modalities can substantially...
An Overview of Multimodal Interaction Techniques and Applications
Introduction: Desktop multimedia (multimedia personal computers) dates from the early 1970s. At that time, the enabling force behind multimedia was the emergence of the new digital technologies in the form of digital text, sound, animation, photography, and, more recently, video. Nowadays, multimedia systems are mostly concerned with the compression and transmission of data over networks, large...
Journal:
Volume Issue
Pages -
Publication date: 2006